Lesson 6: Searching With XPath

Lesson 6 - Searching With XPath - Chapter 1

Introduction

Welcome back! Today's lesson is all about the fundamentals of XPath, XML's query language. You'll see how to move through XML data, searching for a particular element, and then copying or deleting that element.

Good search tools are flexible. They allow you to specify multiple, complex conditions like, "Show me a list of all TV shows after 1975 that include the word Family." And that's what is so cool about XPath—it's a highly flexible search (or query) language. It searches XML documents so you can retrieve and manipulate elements and their content.

So today we're going to explore XPath. You'll see how to use it to find a recipe with a specific title, such as Beef Tips in Wine Sauce. Then, once that recipe is located, you'll learn how to copy its instructions into a textbox, or, if the user requests it, delete the recipe from the XML document.

You'll also learn how to use two invaluable learning and debugging tools—breakpoints and single-stepping.

Ready? Let's get started!

Chapter 2

Going Down the XPath

You might be surprised to learn that you've already used XPath! The code in the cookbook program's lstTitles_Click event that displays a recipe's instructions contains XPath.

Let's examine that code now. I've highlighted the XPath part:

The user clicks the listbox containing recipe titles. That triggers this sub. The code looks through the recipes.xml document (the doc object in our code) until it locates the recipe containing the title the user clicked. Then, the code fetches that recipe's instructions data and displays them in your textbox named txtInstructions. This way, the user can quickly see any recipe's instructions by just clicking it in the list.

What's happening in this code? It starts off by creating an XmlNode object that I named IndividualRecipe (this object will hold a single recipe element).

Remember


Recall that in XML, an element can be called a node. So you could say that XPath navigates through an XML document, searching for one or more nodes. Technically, node can also refer to other items in XML, like attributes and comments. And XPath can search for them too.

Next, the second line of code uses the SelectSingleNode method to locate a single target recipe among all the recipes in doc (the XML file) and store it in our IndividualRecipe object. The target recipe is the one whose title the user clicked, and you use XPath to specify which recipe you're looking for.

Strictly speaking, IndividualRecipe doesn't receive a separate copy. It points to the recipe element that lives inside doc, which is why code can later use IndividualRecipe to change or remove that recipe in the document itself.

IndividualRecipe = doc.SelectSingleNode("descendant::recipe[title='" & lstTitles.SelectedItem & "']")

How does it do this?

The doc.SelectSingleNode method takes an XPath expression. Think of an expression as the question you're asking: your query. Let's say the user clicked European Spaghetti Sauce in the list. In that case, the query is: "Please find the recipe titled European Spaghetti Sauce."

Let's break down our XPath expression one item at a time:

"descendant::recipe[title='" & lstTitles.SelectedItem & "']"
  • Descendant: This means the same thing here as it means in English: a child, grandchild, great-grandchild, and so on, descending until there are no more descendants left. Because you call SelectSingleNode on doc itself, the search starts at the very top of the document. Remember that the root element contains all the rest of the elements in the whole document, so every element is a descendant of that starting point. This is just another way of saying, "Search through all the recipes in this XML document." (XPath offers many more search tools than this descendant relationship. See this lesson's FAQs.)
  • Recipe: This describes which of those descendant elements to look at. In other words, "Only look at elements named recipe."
  • Title: The bracketed part, [title='...'], tells the search to find the recipe whose title matches what the user clicked (lstTitles.SelectedItem).
XPath expression components
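To see this expression work outside the cookbook form, here is a minimal console sketch. It assumes the recipes.xml layout described in this lesson: a <cookbook> root holding <recipe> elements, each with a <title> and an <instructions> child. The module name, the FindRecipe function, and the instruction text are made up for illustration. A plain String parameter stands in for lstTitles.SelectedItem.

```vbnet
Imports System
Imports System.Xml

Module XPathLookupDemo

    ' A tiny stand-in for recipes.xml, following the structure described in this lesson.
    ' The instruction text is placeholder content, not the real recipes.
    Public Const SampleXml As String = "<cookbook>" &
        "<recipe><title>Beef Tips in Wine Sauce</title><instructions>Brown the beef.</instructions></recipe>" &
        "<recipe><title>European Spaghetti Sauce</title><instructions>Simmer the sauce.</instructions></recipe>" &
        "</cookbook>"

    ' Returns the <recipe> element whose <title> matches, or Nothing if there is no match.
    Public Function FindRecipe(doc As XmlDocument, selectedTitle As String) As XmlNode
        ' Same expression as the lstTitles_Click code, with selectedTitle
        ' standing in for lstTitles.SelectedItem.
        Return doc.SelectSingleNode("descendant::recipe[title='" & selectedTitle & "']")
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml(SampleXml)

        Dim IndividualRecipe As XmlNode = FindRecipe(doc, "European Spaghetti Sauce")
        Console.WriteLine(IndividualRecipe.LastChild.InnerText)   ' Simmer the sauce.

        ' A title that isn't in the document produces Nothing rather than an error.
        Console.WriteLine(FindRecipe(doc, "Tofu Surprise") Is Nothing)   ' True
    End Sub

End Module
```

Look at the second call. When no recipe matches, SelectSingleNode returns Nothing instead of raising an error. Code that goes straight on to use the result (for example, IndividualRecipe.LastChild) would fail at that point.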

Understanding Literals

There's one more item in the XPath expression we should examine. How would you describe lstTitles.SelectedItem? What does it do?

"descendant::recipe[title='" & lstTitles.SelectedItem & "']"

lstTitles.SelectedItem is technically a pair of items ganged together: a control (the lstTitles listbox) and a property of that control (SelectedItem)—the title that the user clicked (causing it to be highlighted, AKA selected).

But informally you can think of lstTitles.SelectedItem as grouped together and behaving like a variable. While the program runs, the user is free to click any title (this makes it the selected item in the list). Properties like size, color, text, margin, SelectedItem, and so on aren't fixed. While the program is running, they can vary, and a value that can vary is exactly what programmers call a variable.

Variables don't have to vary. You're unlikely to keep changing the form's color all the time. But the user is quite likely to click different titles in the list.

User can select any title listed

The opposite of a variable is a literal. Sometimes code contains information that can't vary, such as the date of the moon landing.

You could rewrite the above XPath expression like this, using literal text. Just replace the variable lstTitles.SelectedItem with a specific, literal title such as Beef Tips in Wine Sauce:

   IndividualRecipe = doc.SelectSingleNode("descendant::recipe[title='Beef Tips in Wine Sauce']")

The obvious problem with this approach is that it ties the user's hands. The user could still click any title in the listbox, but your code would ignore that and always just display the instructions for Beef Tips in Wine Sauce. Clearly you can't use literals in situations like this. But can you think of any situation when you would want to use a literal?

Use literals when there's only one possible choice. For example, you wrote the cookbook program so it will always use the same XML file, and that file is always in the same folder on the hard drive:

Dim RecipesFilePath As String = "C:\xml projects\cookbook\recipes.xml"

Using a literal makes sense here. This filepath never varies. But if you later modify the cookbook program to allow the user to choose from several different cookbooks (several different XML documents, in other words), then clearly you couldn't use a literal. The user's choice would vary.
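The difference between a literal and a variable is easy to see side by side. This sketch runs both versions of the recipe lookup against a small in-memory cookbook. It assumes the <cookbook>/<recipe>/<title>/<instructions> layout this lesson describes. The function names and instruction text are hypothetical. The variable version follows whatever title it's given, and the literal version ignores it.

```vbnet
Imports System
Imports System.Xml

Module LiteralVsVariableDemo

    ' Small stand-in for recipes.xml; the instruction text is placeholder content.
    Public Function LoadSample() As XmlDocument
        Dim doc As New XmlDocument()
        doc.LoadXml("<cookbook>" &
            "<recipe><title>Beef Tips in Wine Sauce</title><instructions>Brown the beef.</instructions></recipe>" &
            "<recipe><title>European Spaghetti Sauce</title><instructions>Simmer the sauce.</instructions></recipe>" &
            "</cookbook>")
        Return doc
    End Function

    ' Variable version: the query is built from whatever title was clicked.
    Public Function InstructionsFor(doc As XmlDocument, clickedTitle As String) As String
        Dim recipe As XmlNode = doc.SelectSingleNode("descendant::recipe[title='" & clickedTitle & "']")
        Return recipe.LastChild.InnerText
    End Function

    ' Literal version: clickedTitle is ignored, so the answer never changes.
    Public Function InstructionsForLiteral(doc As XmlDocument, clickedTitle As String) As String
        Dim recipe As XmlNode = doc.SelectSingleNode("descendant::recipe[title='Beef Tips in Wine Sauce']")
        Return recipe.LastChild.InnerText
    End Function

    Sub Main()
        Dim doc As XmlDocument = LoadSample()
        For Each clicked As String In {"Beef Tips in Wine Sauce", "European Spaghetti Sauce"}
            Console.WriteLine(clicked & " -> variable: " & InstructionsFor(doc, clicked) &
                              " | literal: " & InstructionsForLiteral(doc, clicked))
        Next
    End Sub

End Module
```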

Dealing With Bizarre Punctuation

One more thing about XPath: What's going on with that weird punctuation and the brackets?

 :: and '" and "' and &

You just have to live with this when you write an XPath expression. This is another of those copy-it-and-forget-it situations.

There was a meeting somewhere, and they decided this is how it's going to be. Two colons separate the relationship (descendant) from the elements to search through (recipe). Brackets enclose the description of the target recipe. Peculiar single/double/double/single quotation marks enclose the variable containing the specific target term.

Think of this punctuation as a kind of template that you have to type your search request into. It's like one of those forms where they give you one inch to write your entire address. Sometimes you just have to deal with the template they give you, whether it makes much sense or not!
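The template does have a logic to it, and it helps to look at the pieces separately. In this hypothetical helper (the function and module names are made up for illustration), each piece plays its own role:

- The double quotes mark where each VB string starts and stops.
- The single quotes end up inside the finished XPath text. They are XPath's own way of marking a text value.
- The & operator glues the pieces together.

```vbnet
Imports System

Module QuotePiecesDemo

    ' Glues the three pieces of the lesson's XPath expression together.
    Public Function AssembleQuery(selectedTitle As String) As String
        ' Piece 1: a VB string (inside double quotes) ending with XPath's opening single quote.
        Dim openingPiece As String = "descendant::recipe[title='"
        ' Piece 2 is selectedTitle itself, supplied while the program runs.
        ' Piece 3: a VB string starting with XPath's closing single quote, then the bracket.
        Dim closingPiece As String = "']"
        Return openingPiece & selectedTitle & closingPiece
    End Function

    Sub Main()
        ' Prints: descendant::recipe[title='European Spaghetti Sauce']
        Console.WriteLine(AssembleQuery("European Spaghetti Sauce"))
    End Sub

End Module
```

Printing the assembled string like this is a quick way to check that the quotation marks pair up the way XPath expects.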

Chapter 3

Watching Your Code in Slo-Mo

Now that we've picked apart the code in the lstTitles_Click sub, wouldn't it be helpful (not to mention fun) to be able to see this code execute? And when I say see, I mean watch it perform very slowly, like when a football team looks at a replay of a game.

You can actually do this with the VB editor. It's called single-stepping. Every time you press the F11 key, a single line of code executes and then halts.

Single-stepping is a great way to learn how programming works, but it's also a big help in tracking down errors. When debugging, you frequently need to figure out where something went wrong in the code. Which line of programming is bad? Stepping through the program in slow-motion can sometimes help you zero in on the offending code pretty quickly.

Single-stepping looks like the step-stop stroll of a bride coming down the aisle, or the way Greek soldiers march. But for programmers, this is more than just a stylized walk. It can make the difference between rapidly finding a bug versus the frustration of trying to locate an elusive error.

Let's step through the lstTitles_Click sub. You'll be able to see exactly how the VB code carries out your XPath query.

  1. Open the cookbook program in VS.
  2. Switch to the code window.
  3. Locate the lstTitles_Click sub.
  4. Find the first line in this sub: Dim IndividualRecipe As XmlNode.
  5. Click the gray bar to the left of that line of code. A red dot appears, like a little stop sign. This red dot is called a breakpoint. (If the editor displays an error message saying "A breakpoint could not be inserted at this location," press F5 to run the program, click the Exit button to stop the program, and then try clicking the gray bar again.)
 

The red stop sign
Tip

Why Use Breakpoints?

To see how XPath works, you want to single-step inside the lstTitles_Click event. That's where the XPath code is, so that's where you want the program to stop.

You could put the breakpoint in the Form_Load event (or any other sub), but you would have to press F11 about two dozen times to step through Form_Load. Why do that? Right now, we're only interested in the lstTitles_Click event. So the red dot goes in the listbox's click event. That way, when you press F5, execution stops where the XPath business is going on.

Similarly, if you were debugging, you'd put the red dot wherever in the code you suspected the bug is lurking. You'd halt execution at that location and begin your close slo-mo examination of what's going on there.

(You can also add multiple breakpoints, but I've never found a reason to do so.)

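  6. Press F5 to run the program. The program now executes at normal speed through the Form_Load event, but it stops in its tracks at the red dot. (If it doesn't stop right away, click a title in the listbox; that click is what triggers the lstTitles_Click event.)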

    You're now in break mode. VB is still running the program, but it's paused, like when you press the pause button on the DVD remote. To let you know you're in break mode, the editor highlights the current line of code in yellow (the line where the breakpoint is) and also displays Cookbook (Debugging) up in the top-left corner of the code window.

     
    Execution paused on yellow code line

    Now for something really cool . . .

  7. Hover your mouse pointer over the IndividualRecipe variable in the breakpoint line. The editor will tell you the contents of this variable at this point in execution! Since you haven't yet executed this line, the variable currently contains Nothing (VB's term for "empty"):
    Hovering cursor and variable contents

    Notice in the figure that when the mouse pointer is over code text, it changes from an arrow to an I-beam.

  8. Try hovering your cursor over the other words in this line of code. Here's what happens:
  • The editor identifies doc as a document, which we already know. But if you click the small black arrow next to doc, you'll open a scrollable list of way more information about doc than you want to know. Since you wrote this XML document, you already understand that it has child nodes, that its root element is <cookbook>, and so on. But this list could come in very handy if you were working with an XML document that someone else wrote!  
    Tons of info about objects
  • Nothing happens when you hover over SelectSingleNode, descendant, or recipe, because they're not variables. So there's no content to display.
  • But if you hover over lstTitles, you see its contents: "Beef Tips in Wine Sauce." There's also a small black arrow here, and when you click it, it tells you lots about this listbox control—its position, size, and many other behaviors and properties. But for normal VB programming, you can ignore this list.

Taking the Next Step

Now you've seen what's going on in this line before its code executes. How about what happens after the line uses the XPath expression to locate that specific Beef Tips recipe in the doc?

Let's take a look.

  1. Press F11. (If the editor displays a message about an automatic step-over, click No.) The editor now executes the current line of code (yellow) and steps down to the next code line where it halts. When it halts, three things happen in the editor:
  • The next line of code turns yellow to show where we're now paused in the code.
  • The breakpoint line turns back to red.
  • The small yellow arrow on the left moves down to point at the current line.
  2. Hover your mouse pointer over LastChild in line 54. You'll see that the recipe element's last child is correctly identified as <instructions>. Recall that the recipe element has only two children, so its FirstChild is <title> and its LastChild is <instructions>. You've hit pay dirt here, because the whole point of using this XPath expression is to locate the instructions so you can copy them into the txtInstructions textbox.
  3. Hover your mouse pointer over InnerText. There they are: the instructions for the Beef Tips recipe!
 
Beef Tips recipe instructions
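You just hovered over three properties: FirstChild, LastChild, and InnerText. You can also check them in a few lines of code. This console sketch assumes a recipe laid out as this lesson describes, with <title> first and <instructions> last. The instruction text is a placeholder, and the module and function names are hypothetical. In the cookbook program, the InnerText value is what gets copied into txtInstructions.

```vbnet
Imports System
Imports System.Xml

Module ChildNodesDemo

    ' Returns the text of a recipe's last child, which is <instructions> in this layout.
    Public Function InstructionsOf(recipe As XmlNode) As String
        Return recipe.LastChild.InnerText
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        ' Placeholder recipe; only the structure mirrors recipes.xml.
        doc.LoadXml("<cookbook><recipe><title>Beef Tips in Wine Sauce</title>" &
                    "<instructions>Brown the beef, then add the wine.</instructions></recipe></cookbook>")

        Dim IndividualRecipe As XmlNode = doc.SelectSingleNode("descendant::recipe[title='Beef Tips in Wine Sauce']")

        Console.WriteLine(IndividualRecipe.FirstChild.Name)   ' title
        Console.WriteLine(IndividualRecipe.LastChild.Name)    ' instructions
        Console.WriteLine(InstructionsOf(IndividualRecipe))   ' Brown the beef, then add the wine.
    End Sub

End Module
```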

Want to see if you did it correctly? Take a look as I walk through the steps:


Chapter 3, Video 1: "Practicing Setting Breakpoints and Single-Stepping Through Code", TRANSCRIPT

Let's set a breakpoint and then also see how to single-step through code.

We'll set a breakpoint here on line 52 and see what happens when we press F5 to execute this program. It breaks here on this line where we set our breakpoint. It also indicates that we're in break mode by using the word "debugging" up here at the top and by various other color cues in the Code window.

You can take a look at the contents of variables or other items by hovering your mouse pointer on top of the variable name. In this case, we can see that the IndividualRecipe variable currently contains nothing. This line of code will, when executed, add some content to that variable. Right now though it has nothing in it.

The doc is identified as a document, and you can also move your mouse pointer down to the little arrow and reveal a great deal of information about the doc object. Most of which you don't have any reason to look at, but sometimes in a difficult debugging situation, this information could be helpful. Frankly, I've never used it.

But let's take a look over here at the lstTitles. It does contain something. It contains "Beef Tips in Wine Sauce," a title. A title from our recipes. And it also has this small arrow that, when you hover your mouse over it, gives you a lot of information about the listbox. Again, I've never needed to use this, but it's there if you want it.

Now we'll single-step. You do that by pressing F11. It moves the line down one. You can always tell where you're located in your code, where you're broken right here, by taking a look at the little yellow arrow over here. This was our breakpoint, but we're down here now.

Okay. What information can we find now in the code? If we hover over the LastChild, we see that this child in the XML is named "instructions," and that's correct. That is the last child of any recipe. Now let's look at the content of this particular element by hovering over the InnerText. And as you can see, it gives us the complete content, the entire instructions for that element.

Single-stepping is extremely valuable both for learning how the code works and also for tracking down bugs.

END TRANSCRIPT



Let's Chat!

The editor includes several debugging tools in addition to breakpoints and single-stepping. Give them a whirl, and see what you think!

  1. Set a breakpoint by clicking the gray bar to the left of a line of code.
  2. Press F5 to execute the program. The program will halt (break) at the breakpoint and enter Debugging mode.
  3. Click the Debug menu to see some of VB's debugging tools.
  4. Try some of these tools out to see what they do. 
  5. Now stop debugging and clean things up by clicking the Debug menu and then choosing Delete All Breakpoints.
Debugging tools

If you find some of these tools useful, pop into the Discussion Area and share your discoveries.

Okay, time to use XPath to add today's new feature to the cookbook program.

Chapter 4

Programming a Delete Feature

In this chapter, we're going to add a button to the cookbook project that lets the user delete a recipe. And to make this happen, we'll use XPath.


Note

If you don't need editor practice (copying and pasting code or adding controls), double-click the cookbook.sln solution file in the C:\XML Projects L6 Finished\Cookbook folder. It contains all the code and controls as they are at the end of today's lesson. You can now skip down to the green dot in the margin below.

If you do want a bit more practice working hands-on with the editor, follow the next steps.

Follow these steps to add a Delete feature:

  1. Start the cookbook project in VS. Either double-click the desktop shortcut you created in Lesson 3, or double-click the cookbook.sln file in the C:\XML Projects\Cookbook folder.
  2. Switch to the Design window (SHIFT + F7).
  3. Right-click the Exit button, and from the context menu, choose Copy.
  4. Press CTRL + V to paste the copied button on the form. (When you copy and paste, the new control inherits properties from the copied control. This simplifies things for you because you don't have to fiddle around making the new button look like the Exit button. We want them the same size, shape, and font size. Looks better that way!)
  5. Drag the new button to align it with the Exit button, but move it up somewhere near the middle of the form above the Exit button.
  6. Use the Properties window to change the new button's Name property to btnDelete. 
  7. In the Properties window, change the new button's Text property to Delete.
  8. Double-click the Delete button to enter the Code window.
  9. Click your mouse in the editor's Code window on the blank line just above the Delete button's End Sub code line. This positions the blinking insertion cursor where you want to paste the following new code.
  10. Drag your mouse over the following code to select it here, and then press CTRL + C to copy it:

  11. In the Code window, press CTRL + V to paste this code where you clicked to set the blinking insertion cursor.

Adding Test Dummy Recipes

Don't press F5 just yet to test this delete feature. You don't want to delete one of the three good recipes currently in the recipes.xml file. Instead, locate the recipes.xml file in the C:\XML L6 Finished\Cookbook folder. That file includes some dummy recipes.

Replace your existing recipes.xml with the new version by following these steps:

  1. Right-click the new recipes.xml file, and choose Copy from the context menu.
  2. Navigate to your C:\XML Projects\Cookbook folder (where the old recipes.xml is stored).
  3. Press CTRL + V to paste the new recipes.xml file into the C:\XML Projects\Cookbook folder. A dialog box opens in Windows Explorer.
  4. Click Replace the file in the destination. And now you're ready to test the delete feature.
  5. Press F5 in the editor to run the cookbook program. It looks like this with the two new test dummy recipes:
Dummies to delete
  6. Try clicking the Delete button to make one of the dummy recipes disappear.

Did it work? Good. Now, let's take a few minutes to examine the code.

Looking at the Code

The code that deletes a recipe in this lesson contains RemoveChild and Save methods—new VB commands that we should briefly examine.

The actual recipe deletion code is pretty straightforward. The first line of code uses the SelectSingleNode command to fetch the target recipe element—the recipe currently selected (highlighted) in the listbox. This is the same XPath expression that we discussed earlier in this lesson:

IndividualRecipe = doc.SelectSingleNode("descendant::recipe[title='" & lstTitles.SelectedItem & "']")

The next code line uses the RemoveChild method to delete the target recipe element from the XML document:

IndividualRecipe.ParentNode.RemoveChild(IndividualRecipe)

Notice that the ParentNode in this case is <cookbook>, the root element in the recipes.xml file.

Then you save the document (with the target recipe now deleted), automatically overwriting the existing recipes.xml file on the hard drive:

doc.Save(RecipesFilePath)
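Here are those steps gathered into one runnable console sketch. It works on a small in-memory document instead of recipes.xml. It also saves to a StringWriter rather than to RecipesFilePath, so no file on your hard drive is touched. Several pieces are illustrative additions, not part of the lesson's btnDelete code: the DeleteRecipe function, its Is Nothing guard, and the dummy titles.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module DeleteRecipeDemo

    ' Removes the recipe with the given title from doc.
    ' Returns True if a matching recipe was found and removed, False otherwise.
    Public Function DeleteRecipe(doc As XmlDocument, targetTitle As String) As Boolean
        ' Same XPath expression the lesson uses, with targetTitle in place of lstTitles.SelectedItem.
        Dim IndividualRecipe As XmlNode = doc.SelectSingleNode("descendant::recipe[title='" & targetTitle & "']")

        ' Illustrative guard (not in the lesson's code): nothing to delete if no title matched.
        If IndividualRecipe Is Nothing Then Return False

        ' An element is removed by asking its parent (<cookbook> here) to drop it.
        IndividualRecipe.ParentNode.RemoveChild(IndividualRecipe)
        Return True
    End Function

    Sub Main()
        ' Made-up dummy recipes standing in for the test entries in recipes.xml.
        Dim doc As New XmlDocument()
        doc.LoadXml("<cookbook>" &
            "<recipe><title>Dummy One</title><instructions>x</instructions></recipe>" &
            "<recipe><title>Dummy Two</title><instructions>y</instructions></recipe>" &
            "</cookbook>")

        DeleteRecipe(doc, "Dummy One")

        ' The lesson calls doc.Save(RecipesFilePath) to overwrite recipes.xml on disk.
        ' Saving to a StringWriter instead keeps this sketch from touching any file.
        Using writer As New StringWriter()
            doc.Save(writer)
            Console.WriteLine(writer.ToString())
        End Using
    End Sub

End Module
```

RemoveChild is called on the parent because an XmlNode can't remove itself. ParentNode gives you the element that owns the recipe.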

So far so good. Now all that's left for you to do is to remove the deleted recipe's title from the listbox.

I ran into a little problem, though, while writing this next code. I solved it by calling on my reliable old friend, single-stepping. In a listbox, each item is identified by its unique index number. So the code starts by storing the deleted title's index number in the variable n:

Dim n As Integer = lstTitles.SelectedIndex

Then you use the RemoveAt(n) command to erase this title from the listbox:

lstTitles.Items.RemoveAt(n)

But when I tested this code by deleting random recipes, I found out that everything works fine unless the user deletes the last recipe in the list. That crashes the program—it stops running and displays a most peculiar message:

Unhandled exception notice

Crashing almost always frightens users. But program crashes are digital events, happening only in Matrix space. They can no more harm real-world hardware than an explosion in a TV movie can harm the TV itself.

I single-stepped through this code to figure out what was going wrong. The problem turned out to be the index number in variable n. When the RemoveAt command erases the last title in the listbox, that index number no longer exists. Here's an example: Say there are four recipes listed, and you delete the fourth. After the deletion only three titles remain, but n still points to the now-nonexistent fourth one. VB won't deal with ghosts. It crashes if you tell it to do something with a title that doesn't exist, just as you would stop short if somebody said "look at that chicken," and there was no chicken.

So I solved the problem by subtracting 1 from n any time the user deletes the final title:

If n = lstTitles.Items.Count Then n = n - 1

This has the effect of moving the index number down to a title that does exist in the list, which VB much prefers.
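You can try the index arithmetic without a form. In this sketch, a List(Of String) stands in for lstTitles.Items; that is an assumption for illustration, since the real code works on the listbox. The hypothetical RemoveAndPickNext function applies the lesson's fix right after the removal. It also assumes the handler then uses n to select the next title.

```vbnet
Imports System
Imports System.Collections.Generic

Module ListIndexDemo

    ' Removes the title at index n, then returns the index to select next.
    ' A List(Of String) stands in for lstTitles.Items in this sketch.
    Public Function RemoveAndPickNext(titles As List(Of String), n As Integer) As Integer
        titles.RemoveAt(n)

        ' The lesson's fix: if n now equals the item count, it points one past
        ' the end of the list (the deleted last title), so step back by one.
        If n = titles.Count Then n = n - 1

        Return n
    End Function

    Sub Main()
        Dim titles As New List(Of String) From {"Recipe A", "Recipe B", "Recipe C", "Recipe D"}

        ' Delete the fourth title (index 3). Only indexes 0 to 2 remain afterward.
        Console.WriteLine(RemoveAndPickNext(titles, 3))   ' 2
    End Sub

End Module
```

Deleting a title from the middle leaves n unchanged, so the title that slid up into that slot becomes the next selection. If the user deletes the only remaining title, n ends up as -1. For a listbox's SelectedIndex, -1 means "nothing selected."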

I didn't foresee this problem when I wrote the code. I only noticed it while I was trying out the Delete button. And I'm sure I've had to fix this exact same bug several times while programming during the past 30 years.

So the moral is this: You often won't get your code right the first time, so you need to try out your program for a while, using its various features, and fixing the bugs before handing the program to someone else (someone who might freak out if it crashes).


Chapter 5

Summary

Today you mastered the basics of XPath, XML's search language! You saw how to use an XPath expression to look through your recipes.xml file, locating a specific recipe based on its title. Then you copied that recipe's instructions into the textbox.

You also learned the difference between literals and variables. You found out why you shouldn't use a literal like "Beef Tips in Wine Sauce" in an XPath expression—the program could never search for any other recipe! And you now also understand where a literal can be useful in programming—for a filename if that file is the only one ever used by the program.

You found out how to halt a running program with breakpoints. And you practiced watching the contents of variables change as you viewed your code step-by-step in slow motion using the editor's single-stepping feature.  

Then you added a Delete button to the cookbook program. We used XPath to locate the recipe that the user wants to delete and the RemoveChild command to actually delete it from the XML document. And we also erased the recipe's title from the listbox, after a brief struggle.

Be sure to work on the assignment for this lesson and check out the FAQs and Supplementary Material section. After you take the quiz, you can move on to the next lesson, where you'll learn how to transform an XML document with the XSLT language. See you there!

Supplementary Material

http://courses.ischool.berkeley.edu/i290-14/s05/lecture-4/allslides.html
http://msdn.microsoft.com/en-us/library/ms256471(v=vs.110).aspx

FAQs

Q: Are there other XPath terms besides descendant?

A: You bet. Dozens! XPath allows all kinds of searches. For example, descendant describes an "offspring" relationship between a parent element and all of its children, grandchildren, great-grandchildren, and so on. But you can also specify other kinds of XPath relationships: ancestor (the opposite of descendant), parent, child, and the sibling relationships following-sibling and preceding-sibling.

And it doesn't stop with just relationships. XPath allows you to also:

  • Use wildcards like * (You can use the asterisk wildcard to mean "match anything." For example, the expression recipe/* will match both recipe/title and recipe/instructions.)
  • Specify all kinds of locations in the document tree, such as the second-to-last element [last()-1]
  • Select multiple paths at the same time
  • Look for attributes and other kinds of nodes, such as comments
  • Use operators like or and < (less than)
  • Do some simple computation, using functions like concat and substring (substring works much like VB's Mid function)

If you're interested in looking more deeply into XPath's capabilities, visit the Web pages in the Supplementary Material section.
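Here are several of those features tried against a tiny in-memory cookbook. This is a sketch. The third recipe title ("Placeholder Pie") and the instruction text are made up, and the module and its CountMatches helper are hypothetical. SelectNodes returns every match rather than just the first, which makes it the companion to the SelectSingleNode method used in this lesson.

```vbnet
Imports System
Imports System.Xml

Module XPathToolboxDemo

    ' Three-recipe stand-in for recipes.xml; "Placeholder Pie" and the instruction text are made up.
    Public Function LoadSample() As XmlDocument
        Dim doc As New XmlDocument()
        doc.LoadXml("<cookbook>" &
            "<recipe><title>Beef Tips in Wine Sauce</title><instructions>a</instructions></recipe>" &
            "<recipe><title>European Spaghetti Sauce</title><instructions>b</instructions></recipe>" &
            "<recipe><title>Placeholder Pie</title><instructions>c</instructions></recipe>" &
            "</cookbook>")
        Return doc
    End Function

    ' How many nodes an XPath expression matches, starting from the document.
    Public Function CountMatches(doc As XmlDocument, xpath As String) As Integer
        Return doc.SelectNodes(xpath).Count
    End Function

    Sub Main()
        Dim doc As XmlDocument = LoadSample()

        ' Wildcard: every child of every recipe (3 titles + 3 instructions).
        Console.WriteLine(CountMatches(doc, "cookbook/recipe/*"))   ' 6
        ' Position: the second-to-last recipe.
        Console.WriteLine(doc.SelectSingleNode("descendant::recipe[last()-1]/title").InnerText)   ' European Spaghetti Sauce
        ' Function: recipes whose title contains the word Sauce.
        Console.WriteLine(CountMatches(doc, "descendant::recipe[contains(title, 'Sauce')]"))   ' 2
        ' Operator: either of two titles.
        Console.WriteLine(CountMatches(doc, "descendant::recipe[title='Placeholder Pie' or title='Beef Tips in Wine Sauce']"))   ' 2
        ' Multiple paths at once, joined with the | (union) operator.
        Console.WriteLine(CountMatches(doc, "descendant::title | descendant::instructions"))   ' 6

        ' Other relationships, starting from the first recipe instead of the whole document.
        Dim firstRecipe As XmlNode = doc.SelectSingleNode("descendant::recipe[1]")
        Console.WriteLine(firstRecipe.SelectSingleNode("parent::*").Name)        ' cookbook
        Console.WriteLine(firstRecipe.SelectNodes("ancestor::*").Count)          ' 1
        Console.WriteLine(firstRecipe.SelectNodes("following-sibling::recipe").Count)   ' 2
    End Sub

End Module
```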

 

Assignment

Single-stepping is both fun and instructive. Try single-stepping through the Form_Load event in the cookbook program, and watch how the following variables change during program execution:

  1. Put a breakpoint on this line by clicking in the gray bar to the left of the line:

    doc = New XmlDocument()

    There's no need to break on the Dim statements above this line; they don't change any variable's values.

  2. Press F5 to run the program. It halts at your breakpoint.
  3. Hover your mouse pointer over the doc variable. Notice that because you haven't executed this line yet, VB reports that doc contains nothing.
  4. Press F11 to step through this line of code. Now hover your mouse over that same doc again. You see doc described as a document. What you've just seen is doc being brought into existence as an XML document object, capable of holding your entire recipes.xml file's contents. The following code line copies the recipes from the hard drive into doc:

    doc.Load(RecipesFilePath)

  5. Press F11 to load the XML into doc.
  6. Hover over the RecipesFilePath variable. You declared it (with Dim) up at the top of the code window and assigned it the recipes.xml filepath. Because it's declared there, outside any sub, every sub in your form can access and use RecipesFilePath.
  7. Now try something a little different. VB includes an alternative to mouse hovering—you can add watches for any variables in the code. Look down a little further in your code until you find the For Each loop. Right-click RecipeTitle, and then click the Add Watch option in the context menu.
Add Watch for RecipeTitle variable
  8. Now look at the bottom of the code window. You'll see that the editor has added a Watch window. This window shows that the RecipeTitle variable currently has nothing in it.
New Watch window down at bottom
  9. Keep pressing F11 to enter the For Each loop. As you continue to press F11 to loop through all the recipes, watch what happens in the Watch window.

This is a great learning experience—and it's also pretty cool. You can now see exactly why variables are called variables. And you get to watch an actual loop looping, repeating its code over and over until there's nothing left in the collection it's working with. At that point, execution moves down to execute the first code line below the loop.
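You can also see the same looping behavior outside the cookbook program. Here is a console sketch of a title-loading loop, and it is hypothetical. The cookbook program's actual Form_Load code may walk the recipes differently, the sample data is made up, and a List(Of String) stands in for lstTitles.Items. Only the name RecipeTitle comes from the lesson. If you set a breakpoint inside the For Each loop and add a watch on RecipeTitle, you should see it take on each title in turn.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module LoadTitlesDemo

    ' Collects every recipe title in document order, roughly the way
    ' a Form_Load loop might fill the lstTitles listbox.
    Public Function LoadTitles(doc As XmlDocument) As List(Of String)
        Dim titles As New List(Of String)
        Dim RecipeTitle As String

        For Each recipe As XmlNode In doc.SelectNodes("descendant::recipe")
            ' RecipeTitle gets a new value on every pass through the loop.
            RecipeTitle = recipe.FirstChild.InnerText
            titles.Add(RecipeTitle)
        Next

        ' Execution continues here once every recipe has been visited.
        Return titles
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<cookbook>" &
            "<recipe><title>Beef Tips in Wine Sauce</title><instructions>a</instructions></recipe>" &
            "<recipe><title>European Spaghetti Sauce</title><instructions>b</instructions></recipe>" &
            "</cookbook>")

        For Each title As String In LoadTitles(doc)
            Console.WriteLine(title)
        Next
    End Sub

End Module
```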